Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
translated by 谷歌翻译
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
translated by 谷歌翻译
多摄像机多对象跟踪目前在计算机视野中引起了注意力,因为它在现实世界应用中的卓越性能,如具有拥挤场景或巨大空间的视频监控。在这项工作中,我们提出了一种基于空间升降的多乳制型配方的数学上优雅的多摄像多对象跟踪方法。我们的模型利用单摄像头跟踪器产生的最先进的TOOTWLET作为提案。由于这些Tracklet可能包含ID-Switch错误,因此我们通过从3D几何投影获得的新型预簇来完善它们。因此,我们派生了更好的跟踪图,没有ID交换机,更精确的数据关联阶段的亲和力成本。然后通过求解全局提升的多乳制型制剂,将轨迹与多摄像机轨迹匹配,该组件包含位于同一相机和相互相机间的Tracklet上的短路和远程时间交互。在Wildtrack DataSet的实验结果是近乎完美的结果,在校园上表现出最先进的追踪器,同时在PETS-09数据集上处于校准状态。我们将在接受纸质时进行我们的实施。
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
我们提出了一个数据收集和注释管道,该数据从越南放射学报告中提取信息,以提供胸部X射线(CXR)图像的准确标签。这可以通过注释与其特有诊断类别的数据相匹配,这些数据可能因国家而异。为了评估所提出的标签技术的功效,我们构建了一个包含9,752项研究的CXR数据集,并使用该数据集的子集评估了我们的管道。以F1得分为至少0.9923,评估表明,我们的标签工具在所有类别中都精确而始终如一。构建数据集后,我们训练深度学习模型,以利用从大型公共CXR数据集传输的知识。我们采用各种损失功能来克服不平衡的多标签数据集的诅咒,并使用各种模型体系结构进行实验,以选择提供最佳性能的诅咒。我们的最佳模型(CHEXPERT-FRECTER EDIDENENET-B2)的F1得分为0.6989(95%CI 0.6740,0.7240),AUC为0.7912,敏感性为0.7064,特异性为0.8760,普遍诊断为0.8760。最后,我们证明了我们的粗分类(基于五个特定的异常位置)在基准CHEXPERT数据集上获得了可比的结果(十二个病理),以进行一般异常检测,同时在所有类别的平均表现方面提供更好的性能。
translated by 谷歌翻译
这项研究介绍了我们对越南语言和语音处理任务(VLSP)挑战2021的文本处理任务的医疗保健领域的自动越南图像字幕的方法作为编码器的体系结构和长期的短期内存(LSTM)作为解码器生成句子。这些模型在不同的数据集中表现出色。我们提出的模型还具有编码器和一个解码器,但是我们在编码器中使用了SWIN变压器,LSTM与解码器中的注意模块结合在一起。该研究介绍了我们在比赛期间使用的培训实验和技术。我们的模型在vietcap4h数据集上达到了0.293的BLEU4分数,并且该分数在私人排行榜上排名3 $^{rd} $。我们的代码可以在\ url {https://git.io/jddjm}上找到。
translated by 谷歌翻译
无人驾驶汽车(UAV)在许多领域都受雇于摄影,紧急,娱乐,国防,农业,林业,采矿和建筑。在过去的十年中,无人机技术在许多施工项目阶段中找到了应用程序,从现场映射,进度监控,建筑物检查,损坏评估和材料交付等等。尽管已经对无人机在各种施工相关的过程中的优势进行了广泛的研究,但关于提高任务能力和效率的无人机协作的研究仍然很少。本文提出了一种基于塔格狩猎游戏和粒子群优化(PSO)的多个无人机的新合作路径计划算法。首先,定义了每个无人机的成本函数,并包含多个目标和约束。然后,开发了无人机游戏框架,以将多功能路径计划制定到寻找回报优势均衡的问题。接下来,提出了基于PSO的算法来获得无人机的最佳路径。由三个无人机检查的大型建筑工地的仿真结果表明,在检查任务期间,提出的算法在为无人机形成的可行和高效飞行路径生成可行,高效的飞行路径上的有效性。
translated by 谷歌翻译
算法追索权旨在推荐提供丰富的反馈,以推翻不利的机器学习决策。我们在本文中介绍了贝叶斯追索权,这是一种模型不足的追索权,可最大程度地减少后验概率比值比。此外,我们介绍了其最小的稳健对应物,目的是对抗机器学习模型参数的未来变化。强大的对应物明确考虑了使用最佳传输(Wasserstein)距离规定的高斯混合物中数据的扰动。我们表明,可以将最终的最差目标函数分解为求解一系列二维优化子问题,因此,最小值追索问题发现问题可用于梯度下降算法。与现有的生成健壮的回流的方法相反,可靠的贝叶斯追索不需要线性近似步骤。数值实验证明了我们提出的稳健贝叶斯追索权面临模型转移的有效性。我们的代码可在https://github.com/vinairesearch/robust-bayesian-recourse上找到。
translated by 谷歌翻译
排队系统出现在许多重要的现实生活应用中,包括通信网络,运输和制造系统。加固学习(RL)框架是排队控制问题的合适模型,在该问题中,基础动力通常未知,并且代理几乎没有从环境中接收到导航的信息。在这项工作中,我们将排队模型作为RL环境的优化方面进行了研究,并提供了有效学习最佳政策的见解。我们通过使用排队网络系统的固有属性来提出策略的新参数化。实验显示了我们的方法的良好性能,从轻度到繁忙的交通状况各种负载条件。
translated by 谷歌翻译
随机梯度下降(SGD)算法是许多机器学习任务中选择的方法,这要归功于其在处理大规模问题方面的可扩展性和效率。在本文中,我们专注于与主流实践启发式符合SGD的改组版。我们将收敛性与过度参数化设置下的一类非凸功率函数的全局解决方案展示为全局解决方案。与以前的文献相比,我们的分析采用更轻松的非凸假设。然而,我们保持了所需的计算复杂性,因为改组SGD在一般凸设置中已实现。
translated by 谷歌翻译